Rank in Wordlist | Word | Rank in Wordlist | Word |
---|---|---|---|
1 | a | 26 | po |
2 | na | 27 | ale |
3 | v | 28 | k |
4 | sa | 29 | som |
5 | je | 30 | pri |
6 | s | 31 | vo |
7 | aj | 32 | m2 |
8 | z | 33 | dom |
9 | do | 34 | Ponúkame |
10 | pre | 35 | ktorý |
11 | ako | 36 | so |
12 | o | 37 | tak |
13 | sú | 38 | už |
14 | V | 39 | by |
15 | to | 40 | len |
16 | že | 41 | Byt |
17 | si | 42 | – |
18 | nachádza | 43 | sme |
19 | alebo | 44 | čo |
20 | od | 45 | ich |
21 | predaj | 46 | nie |
22 | Na | 47 | veľmi |
23 | má | 48 | roku |
24 | za | 49 | domu |
25 | ktoré | 50 | ktorá |
The table shows the 50 most frequent words of the corpus. At these ranks we usually see stopwords.
Language: Afrikaans
This list is a good starting point for a first stopword list for a language.
Usually a small, balanced corpus is enough to get a good list of high-frequency words. But if the small corpus is dominated by one prominent topic, words from that topic will show up even among the top-ranked words.
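As a rough illustration of turning such a top-word list into stopword candidates, the sketch below case-folds the words, drops tokens that are not purely alphabetic, and removes duplicates while keeping rank order. The function name `stopword_candidates` and the short sample list are hypothetical and only serve this example; the sample tokens are taken from the table above.

```python
def stopword_candidates(ranked_words):
    """Return lower-cased, de-duplicated alphabetic tokens in rank order."""
    seen = set()
    candidates = []
    for word in ranked_words:
        # Skip tokens such as "m2" or "–" that are not purely alphabetic.
        if not word.isalpha():
            continue
        key = word.casefold()
        # "V" and "v" (or "Na" and "na") collapse into one candidate.
        if key in seen:
            continue
        seen.add(key)
        candidates.append(key)
    return candidates


# Hypothetical sample: a few tokens copied from the table above.
top_words = ["a", "na", "v", "V", "Na", "m2", "–", "že", "Byt"]
print(stopword_candidates(top_words))
```

Topic words such as "byt" pass this filter, so the result still needs a manual review before it is used as a stopword list.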
select w_id-100 as rank_in_wordlist, word from words where w_id>100 order by w_id limit 50;
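The query above can be tried out on a small in-memory database. The following is a minimal sketch, assuming a `words` table with the columns `w_id` and `word`, and assuming that the IDs 1 to 100 are reserved for non-word entries so that the most frequent word has `w_id` 101 (this is inferred from the `w_id-100` offset in the query, not stated in the text). The reserved placeholder rows and the three sample words are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (w_id INTEGER PRIMARY KEY, word TEXT)")

# Assumption: w_id 1..100 hold reserved entries, real words start at 101.
rows = [(i, f"<reserved{i}>") for i in range(1, 101)]
rows += [(101, "a"), (102, "na"), (103, "v")]
conn.executemany("INSERT INTO words VALUES (?, ?)", rows)

QUERY = """
select w_id-100 as rank_in_wordlist, word
from words where w_id>100 order by w_id limit 50;
"""

top = conn.execute(QUERY).fetchall()
for rank, word in top:
    print(rank, word)
```

Subtracting 100 from `w_id` maps the first real word to rank 1, which matches the rank column in the table.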
3.4 Sample words for different frequency ranges